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Many systems, including digital signal processors, finite impulse response 
(FIR) filters, application-specific integrated circuits, and microprocessors, 
use multipliers. Demand for low power multipliers is rising steadily with current technology trends.
In this study, we describe a 4x4 Wallace multiplier based on a carry select adder (CSA) that uses less
power and has a better power delay product (PDP) than existing multipliers.
The designs are simulated with the HSPICE tool at the 16 nm technology node.
The results demonstrate that the Wallace multiplier design employing CSA
with first zero finding (FZF) logic has the lowest power consumption
of 1.4 uW and a PDP of 27.5 fJ, compared with a power consumption of 1.7 uW
and a PDP of 57.3 fJ for the traditional CSA-based multiplier.
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1. INTRODUCTION 

The widespread use of portable electronic equipment has driven researchers to develop low power
technologies that extend battery life. Design complexity, which in turn affects battery life and device size,
is heavily influenced by the tradeoff between low power and high speed [1]. Since
the multiplier is used in the majority of electronic systems for real-time applications, including
microprocessors, finite impulse response (FIR) filters, image processing, and digital signal processing, the
performance of these applications depends heavily on the multiplier’s design [2]. The multiplier also
accounts for a significant portion of the energy consumed by arithmetic operations. Among the currently
available multipliers, the Wallace tree multiplier is one of the best options, with a better power delay
product (PDP) [3]. These properties make it a suitable choice for battery-operated systems.

The Wallace tree multiplier operates in three stages. In the first stage, partial products are generated
using AND gates; an N-bit multiplier needs N x N AND gates. In the second stage, the tree reduction stage,
column bits are added in parallel, using half adders and full adders or hybrid full adders only [4], until
just two rows are left. This stage takes a significant amount of time for addition, which increases the delay.
In the third stage, a fast adder completes the final addition of the two rows [5]. The carry select adder (CSA)
is a suitable option for the final addition because it can reduce the multiplier’s delay [6]. However, the
conventional CSA incurs an area penalty. A CSA with a binary to excess-1 converter (BEC-1) [7] is therefore
an additional choice for the final addition. Other topologies can also be used for the final addition, such
as a CSA with first zero finding (FZF) logic [8] and a CSA with a combinational circuit.
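The idea behind the conventional carry select adder can be illustrated with a small behavioral model. This is only a sketch of the general technique, not the transistor-level circuit simulated in this work; the function names and the little-endian bit-list representation are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin):
    """Add two little-endian bit lists with a ripple carry chain."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def carry_select_add(a_bits, b_bits, cin):
    """Conventional carry select block: precompute both results, then select."""
    sums0, cout0 = ripple_carry_add(a_bits, b_bits, 0)  # candidate for carry-in = 0
    sums1, cout1 = ripple_carry_add(a_bits, b_bits, 1)  # candidate for carry-in = 1
    # Multiplexer: the actual carry-in picks one of the precomputed results.
    return (sums1, cout1) if cin else (sums0, cout0)
```

Because both candidates are ready before the carry-in arrives, only the multiplexer lies on the critical path. The duplicated ripple carry adder is the area penalty mentioned above.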

The next section describes the algorithm that the Wallace multiplier uses for multiplication.
Section 3 discusses the design of the conventional carry select adder (CSA) based Wallace multiplier, the
multiplier using CSA with a binary to excess-1 (BEC-1) converter, the multiplier using CSA with a modified
BEC-1 converter, the multiplier using CSA with FZF logic, and the multiplier using CSA with a combinational
circuit. Section 4 presents the results and a comparison of the proposed Wallace multipliers. Section 5
presents the conclusion.


2. ALGORITHM USED FOR MULTIPLICATION 

The following is a basic explanation of how a Wallace tree multiplier multiplies two 4-bit numbers.
It includes three stages: stage-1 is the partial product generation stage, stage-2 is the tree reduction
stage, and stage-3 is the final addition stage.

a) Let’s assume that the multiplier b3 b2 b1 b0 is to be applied to the multiplicand a3 a2 a1 a0. This can be
accomplished by employing AND gates, which generate the partial products P0, P1, P2, P3,
P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, and P15 [9], as shown in Table 1.

b) The first three rows of partial products in Table 1 are added using hybrid full adders, which reduces
them to the two rows shown in Table 2. The column bits of the first three rows are added in parallel:

- Partial product P0 is passed directly as the least significant bit (LSB) of the sum bits.

- P1 and P4 are added to produce the sum bit S0 and the carry bit C0.

- P2, P5 and P8 are added, generating the sum bit S1 and the carry bit C1.

- P3, P6 and P9 are added, generating the sum bit S2 and the carry bit C2.

- P7 and P10 are added, generating the sum bit S3 and the carry bit C3.

- P11 is left for further addition.

The results of tree reduction stage-2 step-1 (Table 2) can be expressed as partial sum bits and internal
carry bits. The partial sums are generated as: P0 = a0b0, S0 = P1 xor P4, S1 = P2 xor P5 xor P8,
S2 = P3 xor P6 xor P9, S3 = P7 xor P10, and P11 = a2b3. The internal carries are generated as C0, C1, C2,
and C3, respectively. The results of step-1 (Table 2) and the fourth row in Table 1 are then added using
hybrid full adders. This generates further partial sum bits and partial carry bits,
shown in Table 3.

The results of tree reduction stage-2 step-2 (Table 3) can be expressed as follows. The partial sums are
generated as P0 = a0b0, S0 = P1 xor P4, S4 = S1 xor C0, S5 = S2 xor C1 xor a3b0,
S6 = S3 xor C2 xor a3b1, S7 = a2b3 xor C3 xor a3b2, and P15 = a3b3. The internal carries are
generated as C4, C5, C6, and C7, respectively.

c) The final addition of the last two rows can use any fast adder [10]. In the proposed work, a carry select
adder is used for the final addition. The final result consists of the sum bits, beginning with the LSB, and
a most significant bit (MSB) that serves as the carry bit, as shown in Table 4 (final addition stage-3).
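Steps a) to c) can be checked with a short bit-level model. The sketch below follows the sum and carry equations given in this section and models every hybrid full adder as an ideal full adder (a carry-in of 0 makes it behave as a half adder). The final addition is written as a plain ripple chain rather than a carry select adder, which gives the same result bits; the function names are illustrative assumptions.

```python
def full_adder(a, b, cin=0):
    """Ideal model of a hybrid full adder; cin=0 makes it act as a half adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def wallace_4x4(a, b):
    """Multiply two 4-bit integers following stage-1 to stage-3 of this section."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> j) & 1 for j in range(4)]

    # Stage 1: P[4*i + j] = a_i AND b_j, the numbering used in Table 1.
    P = [abit[i] & bbit[j] for i in range(4) for j in range(4)]

    # Stage 2, step 1: reduce the first three rows (Table 2).
    S0, C0 = full_adder(P[1], P[4])
    S1, C1 = full_adder(P[2], P[5], P[8])
    S2, C2 = full_adder(P[3], P[6], P[9])
    S3, C3 = full_adder(P[7], P[10])

    # Stage 2, step 2: add the fourth row P12..P15 (Table 3).
    S4, C4 = full_adder(S1, C0)
    S5, C5 = full_adder(S2, C1, P[12])
    S6, C6 = full_adder(S3, C2, P[13])
    S7, C7 = full_adder(P[11], C3, P[14])

    # Stage 3: final addition of the two remaining rows (Table 4).
    S8, C8 = full_adder(S5, C4)
    S9, C9 = full_adder(S6, C5, C8)
    S10, C10 = full_adder(S7, C6, C9)
    S11, C11 = full_adder(P[15], C7, C10)

    # Output bits, least significant first; C11 is the MSB (cout).
    out_bits = [P[0], S0, S4, S8, S9, S10, S11, C11]
    return sum(bit << k for k, bit in enumerate(out_bits))
```

Comparing `wallace_4x4(a, b)` with `a * b` for all 256 operand pairs is a quick way to confirm that the equations in Tables 1 to 4 are mutually consistent.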


Table 1. Partial product generation stage-1

Generated partial product                                                                              Row
P3 = a0 AND b3 = a0b3     P2 = a0 AND b2 = a0b2     P1 = a0 AND b1 = a0b1    P0 = a0 AND b0 = a0b0     1
P7 = a1 AND b3 = a1b3     P6 = a1 AND b2 = a1b2     P5 = a1 AND b1 = a1b1    P4 = a1 AND b0 = a1b0     2
P11 = a2 AND b3 = a2b3    P10 = a2 AND b2 = a2b2    P9 = a2 AND b1 = a2b1    P8 = a2 AND b0 = a2b0     3
P15 = a3 AND b3 = a3b3    P14 = a3 AND b2 = a3b2    P13 = a3 AND b1 = a3b1   P12 = a3 AND b0 = a3b0    4


Table 2. Tree reduction stage-2 step-1
Tree reduction-step 1
P11 = a2b3    S3 = P7+P10    S2 = P3+P6+P9    S1 = P2+P5+P8    S0 = P1+P4    P0 = a0b0
C3            C2             C1               C0


The final sum bits and output carry bit (Table 4) are expressed as: P0 = a0b0, S0 = P1 xor P4,
S4 = S1 xor C0, S8 = S5 xor C4, S9 = S6 xor C5 xor C8, S10 = S7 xor C6 xor C9, and S11 = P15 xor C7 xor
C10. The internal carries are generated as C8, C9, and C10, and the final output carry is C11.

Table 3. Tree reduction stage-2 step-2
Tree reduction-step 2 using half adder and full adders
Result obtained for addition of first three rows               a2b3    S3      S2      S1    S0    P0=a0b0
                                                               C3      C2      C1      C0
Fourth row                                          a3b3       a3b2    a3b1    a3b0
Generated partial sum bits                          P15=a3b3   S7      S6      S5      S4    S0    P0=a0b0
Generated partial carry bit                         C7         C6      C5      C4

Table 4. Final addition stage-3
Final addition stage using half adder and full adders
                                          P15=a3b3   S7     S6    S5    S4    S0    P0=a0b0
                                          C7         C6     C5    C4
                                          C10        C9     C8
Generated final sum bits                  S11        S10    S9    S8    S4    S0    P0=a0b0
Generated final carry bit    C11 (cout)


3. METHODOLOGY 
3.1. Wallace multiplier using conventional carry select adder-WM-design 1 

Wallace multiplier (WM)-design 1 consists of the three stages described in the algorithm and is shown
in Figure 1. The first two stages are designed using hybrid full adders only; the half adders used in the
conventional Wallace multiplier are replaced with hybrid full adders. The partial products generated in
stage 1 are fed into stage 2, except P0 = a0b0, which is passed directly as the least significant output sum
bit. The output of stage 2 is fed as input into stage 3 for the final addition. Output bits S0, S4, and S8
obtained from stage-2 are also passed directly as output bits. In stage-3, a carry select adder [11] is chosen
for the final addition. The design inside the red dotted line in the figure is the carry select adder, which
consists of a two-stage ripple carry adder. The ripple carry adder is designed using hybrid full adders
[12]-[13], each consisting of 22 transistors.
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Figure 1. WM-design-1 


3.2. Wallace multiplier using BEC-1 based carry select adder-WM-design 2 

WM-design 2, shown in Figure 2, modifies WM-design 1. Here, the conventional
carry select adder used in final addition stage-3 is replaced by a BEC-1 converter based carry select adder.
It uses fewer logic gates [14], [15], which results in lower area utilization and lower power
consumption than the conventional carry select adder based multiplier [16]-[19]. The BEC-1 logic
circuit consists of an inverter, XOR gates, and an AND gate. The circuit enclosed by the red dotted line in
the figure is the BEC-1 circuit. The output of the BEC-1 circuit is obtained through the multiplexer, with C9
as the select line. Sum bits S10 and S11 and carry output bit C11 are the outputs of the multiplexer. The
other output bits are generated from stage-1 and stage-2.
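A behavioral sketch of a BEC-1 based carry select block may help make the gate list above concrete. It assumes a 2-bit block whose 3-bit carry-in = 0 result (two sum bits plus carry) feeds a 3-bit BEC-1, with the select input playing the role of C9. The gate equations are the textbook BEC-1 form and are not read off Figure 2; all names are illustrative.

```python
def full_adder(a, b, cin=0):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def bec1_3bit(x0, x1, x2):
    """Binary to excess-1 converter: returns the bits of (x + 1) mod 8."""
    y0 = x0 ^ 1            # inverter
    y1 = x1 ^ x0           # XOR gate
    y2 = x2 ^ (x0 & x1)    # AND gate followed by an XOR gate
    return y0, y1, y2


def bec_carry_select_2bit(a0, a1, b0, b1, sel):
    """2-bit carry select block; sel is the incoming carry (select line)."""
    # A single ripple carry adder, evaluated for carry-in = 0 only.
    s0, c = full_adder(a0, b0)
    s1, cout = full_adder(a1, b1, c)
    # BEC-1 derives the carry-in = 1 result from the carry-in = 0 result.
    t0, t1, tcout = bec1_3bit(s0, s1, cout)
    # Multiplexer chooses between the two candidates.
    return (t0, t1, tcout) if sel else (s0, s1, cout)
```

With (a0, a1) = (S7, P15), (b0, b1) = (C6, C7) and sel = C9, the three outputs correspond to S10, S11 and C11 in the equations of section 2. The second ripple carry adder of the conventional design is replaced by the smaller BEC-1 network.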


Implementation of FinFET technology based low power 4x4 Wallace tree multiplier using ... (Shikha Singh) 


1142 0 ISSN: 1693-6930 


[Block diagram: inputs b3 b2 b1 b0 and a3 a2 a1 a0; partial product generation stage 1 using AND gates; tree reduction stage 2 using hybrid full adders; final addition stage 3 using BEC-1 based CSA; carry output Cout (C11)]

Figure 2. WM-design 2 


3.3. Wallace multiplier using modified BEC-1 based CSA-WM-design 3 

To further minimize the logic gates in the carry propagation path [20]-[23] of the proposed
WM-design 2, a modified BEC-1 based CSA logic circuit is proposed for the Wallace multiplier, WM-design 3,
shown in Figure 3. The most significant output carry bit of the multiplier is generated in
the carry propagation path. Replacing the XOR gate with an OR gate in the BEC-1 circuit discussed
in WM-design 2 simplifies the logic function in the carry propagation path [24].

The major goal of the design is to minimize the number of basic logic gates at the most significant
bit (MSB) position on the carry propagation path by considering all feasible input scenarios, thereby
minimizing area and power consumption. In addition, the output obtained through the OR gate is exactly
the same as that obtained through the XOR gate of the BEC-1 circuit used in WM-design 2. The S10 and S11
bits of the result are generated using the multiplexer shown in the red dotted block in Figure 2. The output
carry bit cout (C11) is generated using the OR gate. P0 = a0b0, S0, S4, S8, and S9 are generated by the other
two stages, i.e., stage-1 and stage-2.
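One way to see why an OR gate can stand in for the XOR gate at the MSB is an exhaustive check over all feasible inputs. The sketch assumes a 2-bit final block and the textbook BEC-1 MSB term, carry xor (s0 and s1); both are assumptions for illustration rather than details taken from Figure 3. Since the sum of two 2-bit values with carry-in 0 is at most 6, the carry and the term s0 and s1 are never 1 at the same time, which is the only case where XOR and OR differ.

```python
from itertools import product


def msb_terms(a, b):
    """Return (carry, s0 AND s1) for the carry-in = 0 sum of two 2-bit values."""
    total = a + b
    s0 = total & 1
    s1 = (total >> 1) & 1
    carry = (total >> 2) & 1
    return carry, s0 & s1


def xor_equals_or_everywhere():
    """True if XOR and OR agree at the MSB for every 2-bit operand pair."""
    for a, b in product(range(4), repeat=2):
        carry, both_ones = msb_terms(a, b)
        if (carry ^ both_ones) != (carry | both_ones):
            return False
    return True
```

Under these assumptions `xor_equals_or_everywhere()` holds for all 16 operand pairs, which is consistent with the statement that the OR gate output matches the XOR gate output.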


3.4. Wallace multiplier using FZF logic based CSA-WM-design 4 

In WM-design 4, shown in Figure 4, the FZF circuit replaces the second stage ripple carry adder and the
multiplexers present in the final addition stage of the conventional CSA based multiplier. The FZF design
helps reduce area utilization and power consumption. S10, S11, and cout (C11) are output through the XOR gate
of the FZF logic circuit. The other output bits, obtained through hybrid full adders, are S8, S4, S0, and the
LSB output bit a0b0.
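The arithmetic idea usually associated with first zero finding logic is that adding 1 to a binary number complements every bit from the LSB up to and including the first 0, and leaves the higher bits unchanged. The sketch below illustrates only that idea on a little-endian bit list; it is an assumption about how the FZF block of [8] behaves and does not reproduce the gate-level circuit of Figure 4.

```python
def fzf_increment(bits):
    """Add 1 to a little-endian bit list by flipping bits up to the first zero."""
    out = list(bits)
    for k, bit in enumerate(bits):
        out[k] = bit ^ 1   # complement this position
        if bit == 0:       # first zero found: higher bits stay unchanged
            break
    return out
```

For example, `fzf_increment([1, 1, 0])` (binary 011, value 3) flips the two trailing ones and the first zero, giving `[0, 0, 1]` (binary 100, value 4). If every bit is 1, all bits flip and the result wraps to zero.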


3.5. Wallace multiplier using D latch based CSA-WM-design 5 

In the proposed WM-design 5, shown in Figure 5, the second stage of the carry select adder in the final
addition stage of the Wallace multiplier is replaced by a parallel structure of D latches [10]. A D latch
stores one bit of information, with its enable pin acting as a clock. The output is stored in the D latch
while the enable pin is 1; when enable goes low (en = 0), the D latch output and the hybrid full adder sum
bit output are sent to the multiplexer. Using the internal carry bit output (C8) from the previous hybrid
full adder as the select line, the multiplexer output [25] is obtained as the S9, S10, and S11 output sum
bits and the cout (C11) output carry bit. The other output bits are generated using hybrid full adders in
the first two stages.
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Figure 5. WM-design 5 


4. RESULTS AND DISCUSSION 

Simulation is done using the Synopsys HSPICE tool at the 16 nm FinFET technology node. The predictive
technology model multi gate (PTM-MG) library is used to obtain the results. Transient analysis is
done to verify the output results. Average propagation delay, average power consumption, PDP, and
energy delay product (EDP) are calculated for the proposed designs. Results are calculated for three
different cases so that the circuits can be analyzed under three extreme conditions. The three
cases for which the proposed designs are simulated are shown in Table 5.

Table 6 shows the results obtained for the proposed Wallace multiplier designs, viz. WM-design 1,
WM-design 2, WM-design 3, WM-design 4, and WM-design 5, at room temperature with a supply voltage of
0.8 V. Delay, power, PDP, and EDP are calculated for each design.
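The relationship between the reported quantities can be checked with a few lines of arithmetic. The sketch assumes the usual definitions, PDP = power x delay and EDP = PDP x delay, which the source does not spell out. The values are copied from Table 6; the helper names are illustrative.

```python
# Table 6 values: (delay in ns, power in uW, reported PDP in fJ, reported EDP in fJ-ns)
TABLE_6 = {
    "WM-design 1": (34.5, 1.7, 57.3, 1976.9),
    "WM-design 2": (44.5, 1.5, 66.3, 2950.4),
    "WM-design 3": (25.1, 1.5, 36.9, 926.2),
    "WM-design 4": (20.2, 1.4, 27.5, 555.5),
    "WM-design 5": (54.8, 1.6, 85.5, 4685.4),
}


def pdp_fj(delay_ns, power_uw):
    """Power delay product; ns * uW = 1e-15 J, i.e. femtojoules."""
    return delay_ns * power_uw


def edp_fj_ns(pdp, delay_ns):
    """Energy delay product in fJ-ns, taken as PDP times delay."""
    return pdp * delay_ns


for name, (delay, power, pdp, edp) in TABLE_6.items():
    print(name, round(pdp_fj(delay, power), 2), round(edp_fj_ns(pdp, delay), 2))
```

Under these definitions the reported EDP values match PDP x delay to within rounding. The reported PDP values are a few percent lower than delay x power computed from the tabulated figures, possibly because the power column is rounded to one decimal place.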

Table 7 shows the results obtained for the proposed Wallace multiplier designs, viz. WM-design 1,
WM-design 2, WM-design 3, WM-design 4, and WM-design 5, at a temperature of -40 °C with a supply voltage of
0.9 V. Delay, power, PDP, and EDP are calculated for each design.

Table 8 shows the results obtained for the proposed Wallace multiplier designs, viz. WM-design 1,
WM-design 2, WM-design 3, WM-design 4, and WM-design 5, at a temperature of 120 °C with a supply
voltage of 0.7 V. Delay, power, PDP, and EDP are calculated for each design.

Table 9 shows the comparative analysis of the proposed Wallace multiplier designs, viz. WM-design 1,
WM-design 2, WM-design 3, WM-design 4, and WM-design 5, with existing multipliers. The average power
consumption of the proposed designs is compared with that of the existing designs. The proposed designs
consume less power than the existing ones.


Table 5. Cases for simulation results
Case    Supply voltage    Temperature
1       0.8 V             25 °C
2       0.9 V             -40 °C
3       0.7 V             120 °C


Table 6. Case 1, simulated results obtained with supply voltage = 0.8 V and temperature = 25 °C
Proposed design    Delay (ns)    Power (uW)    PDP (fJ)    EDP (fJ-ns)
WM-design 1        34.5          1.7           57.3        1976.9
WM-design 2        44.5          1.5           66.3        2950.4
WM-design 3        25.1          1.5           36.9        926.2
WM-design 4        20.2          1.4           27.5        555.5
WM-design 5        54.8          1.6           85.5        4685.4


Table 7. Case 2, simulated results obtained with supply voltage = 0.9 V and temperature = -40 °C
Proposed design    Delay (ns)    Power (uW)    PDP (fJ)    EDP (fJ-ns)
WM-design 1        14.7          11.4          167.6       2462.3
WM-design 2        15.3          2.4           36.9        564.2
WM-design 3        29.4          2.4           69.1        2031.2
WM-design 4        20.2          2.2           44.0        889.5
WM-design 5        34.6          2.6           89.6        3100.6


Table 8. Case 3, simulated results obtained with supply voltage = 0.7 V and temperature = 120 °C
Proposed design    Delay (ns)    Power (uW)    PDP (fJ)    EDP (fJ-ns)
WM-design 1        34.7          1.1           38.1        1322.1
WM-design 2        44            0.93          40.9        1799.6
WM-design 3        29.8          0.75          22.3        664.54
WM-design 4        24.4          0.86          20.9        509.96
WM-design 5        54.6          0.96          52.5        2866.5


Table 9. Comparative analysis of the proposed designs with the existing work
Wallace multiplier         Power (mW)
[7]-conventional           83.22
[7]-using CSLA             82.88
[7]-using CSLA with BEC    80.98
[12]                       0.694
[13]                       0.54
[14]                       0.082
[15]                       0.192
WM-design 1-proposed       0.0017
WM-design 2-proposed       0.0015
WM-design 3-proposed       0.0015
WM-design 4-proposed       0.0014
WM-design 5-proposed       0.0016


5. CONCLUSION

Multipliers are used in many applications, such as biomedical signal processing and portable
electronic equipment. In this paper, various designs are proposed for the Wallace multiplier with low
power consumption and improved PDP. It is concluded that the Wallace multiplier designed using FZF based
CSA (WM-design 4) shows better results than the other proposed designs. WM-design 4 has a
PDP of 27.5 fJ, which is 52% less, and an EDP of 555.5 fJ-ns, which is 71.9% less, than the conventional
CSA based Wallace multiplier (WM-design 1). However, the Wallace multiplier designed using BEC-1
based CSA (WM-design 2) and the Wallace multiplier designed using D latch based CSA (WM-design 5) have
higher PDP and EDP than the other proposed designs.
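The percentage reductions quoted for WM-design 4 can be reproduced from the Table 6 values for WM-design 1 and WM-design 4. This is a minimal sketch; the helper name `percent_reduction` is illustrative.

```python
def percent_reduction(baseline, value):
    """Reduction of value relative to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


# Case 1 (0.8 V, 25 °C) values for WM-design 1 (baseline) and WM-design 4.
pdp_reduction = percent_reduction(57.3, 27.5)      # PDP in fJ
edp_reduction = percent_reduction(1976.9, 555.5)   # EDP in fJ-ns
print(round(pdp_reduction, 1), round(edp_reduction, 1))
```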
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